Overview

Dataset statistics

Number of variables: 29
Number of observations: 1784
Missing cells: 2
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 172.6 KiB
Average record size in memory: 99.1 B
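
The derived figures in the dataset statistics follow from simple arithmetic on the other entries. The sketch below is only an illustration: it assumes that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes, and the variable names are made up for this example.

```python
# Values copied from the dataset statistics above.
n_variables = 29
n_observations = 1784
missing_cells = 2
total_size_kib = 172.6

# Assumption: the missing-cell percentage is relative to all cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.4f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the missing share is far below 0.1% and the average record size rounds to the reported 99.1 B.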

Variable types

BOOL: 19
NUM: 8
CAT: 2

Warnings

suburb has a high cardinality: 256 distinct values [High cardinality]
parking is highly skewed (γ1 = 20.56170848) [Skewed]
Unnamed: 0 has unique values [Unique]
ID has unique values [Unique]
bedroom has 145 (8.1%) zeros [Zeros]
bathroom has 114 (6.4%) zeros [Zeros]
garage has 826 (46.3%) zeros [Zeros]
parking has 1318 (73.9%) zeros [Zeros]

Reproduction

Analysis started: 2020-09-25 11:36:13.811550
Analysis finished: 2020-09-25 11:36:28.571168
Duration: 14.76 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

Unnamed: 0
Real number (ℝ≥0)

UNIQUE

Distinct: 1784
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 891.5
Minimum: 0
Maximum: 1783
Zeros: 1
Zeros (%): 0.1%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 89.15
Q1: 445.75
median: 891.5
Q3: 1337.25
95-th percentile: 1693.85
Maximum: 1783
Range: 1783
Interquartile range (IQR): 891.5

Descriptive statistics

Standard deviation: 515.1407575
Coefficient of variation (CV): 0.577835959
Kurtosis: -1.2
Mean: 891.5
Median Absolute Deviation (MAD): 446
Skewness: 0
Sum: 1590436
Variance: 265370
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
1783                 1      0.1%
1196                 1      0.1%
1174                 1      0.1%
1176                 1      0.1%
1178                 1      0.1%
1180                 1      0.1%
1182                 1      0.1%
1184                 1      0.1%
1186                 1      0.1%
1188                 1      0.1%
Other values (1774)  1774   99.4%

Minimum 5 values

Value  Count  Frequency (%)
0      1      0.1%
1      1      0.1%
2      1      0.1%
3      1      0.1%
4      1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
1783   1      0.1%
1782   1      0.1%
1781   1      0.1%
1780   1      0.1%
1779   1      0.1%
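
Several of the statistics reported for Unnamed: 0 can be checked by hand. The sketch below assumes the column is simply the integers 0 to 1783. That is consistent with the reported minimum, maximum, distinct count, sum and strict monotonicity, but the report does not state it. It uses only Python's standard statistics module, not pandas-profiling itself.

```python
import statistics

# Assumption: the column holds exactly the integers 0..1783.
values = list(range(1784))

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)          # sample standard deviation
cv = std / mean                         # coefficient of variation

# Quartiles with linear interpolation between the closest ranks.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print(mean, variance, round(std, 7), round(cv, 9))
print(q1, median, q3, iqr, mad)
```

Under that assumption the results agree with the reported mean, variance, standard deviation, CV, quartiles, IQR and MAD.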
 

suburb
Categorical

HIGH CARDINALITY

Distinct: 256
Distinct (%): 14.3%
Missing: 0
Missing (%): 0.0%
Memory size: 13.9 KiB

Value               Count  Frequency (%)
Sea Po              80     4.5%
Plattekloof         63     3.5%
Camps Bay           58     3.3%
Constantia Upper    55     3.1%
Baronetcy Estate    50     2.8%
Claremont Upper     50     2.8%
Foreshore           46     2.6%
Big Bay             44     2.5%
ondebosch           42     2.4%
Cape Town Central   41     2.3%
Other values (246)  1255   70.3%

Frequencies of value counts

Unique

Unique: 95
Unique (%): 5.3%
Histogram of lengths of the category

Length

Max length: 45
Median length: 12
Mean length: 13.40919283
Min length: 2

bedroom
Real number (ℝ≥0)

ZEROS

Distinct: 14
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.942825112
Minimum: 0
Maximum: 13
Zeros: 145
Zeros (%): 8.1%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
median: 3
Q3: 4
95-th percentile: 6
Maximum: 13
Range: 13
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.681306305
Coefficient of variation (CV): 0.571323895
Kurtosis: 2.189194335
Mean: 2.942825112
Median Absolute Deviation (MAD): 1
Skewness: 0.7259493596
Sum: 5250
Variance: 2.826790893
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)

Common values

Value             Count  Frequency (%)
3                 456    25.6%
2                 444    24.9%
4                 337    18.9%
5                 167    9.4%
0                 145    8.1%
1                 139    7.8%
6                 50     2.8%
7                 22     1.2%
8                 12     0.7%
10                5      0.3%
Other values (4)  7      0.4%

Minimum 5 values

Value  Count  Frequency (%)
0      145    8.1%
1      139    7.8%
2      444    24.9%
3      456    25.6%
4      337    18.9%

Maximum 5 values

Value  Count  Frequency (%)
13     1      0.1%
12     1      0.1%
11     1      0.1%
10     5      0.3%
9      4      0.2%

bathroom
Real number (ℝ≥0)

ZEROS

Distinct: 19
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.337163677
Minimum: 0
Maximum: 13
Zeros: 114
Zeros (%): 6.4%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 2
Q3: 3
95-th percentile: 5
Maximum: 13
Range: 13
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.540069786
Coefficient of variation (CV): 0.6589481948
Kurtosis: 2.914619264
Mean: 2.337163677
Median Absolute Deviation (MAD): 1
Skewness: 1.238854569
Sum: 4169.5
Variance: 2.371814946
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)

Common values

Value             Count  Frequency (%)
2                 529    29.7%
1                 430    24.1%
3                 225    12.6%
4                 126    7.1%
0                 114    6.4%
2.5               88     4.9%
5                 79     4.4%
3.5               56     3.1%
4.5               37     2.1%
1.5               28     1.6%
Other values (9)  72     4.0%

Minimum 5 values

Value  Count  Frequency (%)
0      114    6.4%
1      430    24.1%
1.5    28     1.6%
2      529    29.7%
2.5    88     4.9%

Maximum 5 values

Value  Count  Frequency (%)
13     1      0.1%
10     2      0.1%
9      4      0.2%
8      4      0.2%
7.5    4      0.2%

garage
Real number (ℝ≥0)

ZEROS

Distinct: 11
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.117152466
Minimum: 0
Maximum: 12
Zeros: 826
Zeros (%): 46.3%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 2
95-th percentile: 3
Maximum: 12
Range: 12
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.302423457
Coefficient of variation (CV): 1.165842171
Kurtosis: 4.706449638
Mean: 1.117152466
Median Absolute Deviation (MAD): 1
Skewness: 1.473306876
Sum: 1993
Variance: 1.696306862
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)

Common values

Value  Count  Frequency (%)
0      826    46.3%
2      500    28.0%
1      261    14.6%
3      108    6.1%
4      65     3.6%
6      11     0.6%
5      9      0.5%
12     1      0.1%
10     1      0.1%
8      1      0.1%

Minimum 5 values

Value  Count  Frequency (%)
0      826    46.3%
1      261    14.6%
2      500    28.0%
3      108    6.1%
4      65     3.6%

Maximum 5 values

Value  Count  Frequency (%)
12     1      0.1%
10     1      0.1%
8      1      0.1%
7      1      0.1%
6      11     0.6%

parking
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 12
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5168161435
Minimum: 0
Maximum: 60
Zeros: 1318
Zeros (%): 73.9%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 2
Maximum: 60
Range: 60
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.829142709
Coefficient of variation (CV): 3.539252269
Kurtosis: 632.7535455
Mean: 0.5168161435
Median Absolute Deviation (MAD): 0
Skewness: 20.56170848
Sum: 922
Variance: 3.345763049
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common values

Value             Count  Frequency (%)
0                 1318   73.9%
1                 257    14.4%
2                 143    8.0%
3                 27     1.5%
4                 18     1.0%
8                 7      0.4%
6                 6      0.3%
10                4      0.2%
60                1      0.1%
15                1      0.1%
Other values (2)  2      0.1%

Minimum 5 values

Value  Count  Frequency (%)
0      1318   73.9%
1      257    14.4%
2      143    8.0%
3      27     1.5%
4      18     1.0%

Maximum 5 values

Value  Count  Frequency (%)
60     1      0.1%
15     1      0.1%
12     1      0.1%
10     4      0.2%
8      7      0.4%

erfSize
Real number (ℝ≥0)

Distinct: 764
Distinct (%): 42.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 970.2300448
Minimum: 38
Maximum: 32397
Zeros: 0
Zeros (%): 0.0%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 38
5-th percentile: 190.6
Q1: 595
median: 715
Q3: 845.25
95-th percentile: 2044.1
Maximum: 32397
Range: 32359
Interquartile range (IQR): 250.25

Descriptive statistics

Standard deviation: 1732.925427
Coefficient of variation (CV): 1.786097469
Kurtosis: 144.9050997
Mean: 970.2300448
Median Absolute Deviation (MAD): 120
Skewness: 10.76837172
Sum: 1730890.4
Variance: 3003030.537
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
715                 699    39.2%
496                 13     0.7%
495                 9      0.5%
595                 8      0.4%
180                 8      0.4%
1000                7      0.4%
1200                5      0.3%
160                 5      0.3%
1004                4      0.2%
652                 4      0.2%
Other values (754)  1022   57.3%

Minimum 5 values

Value  Count  Frequency (%)
38     1      0.1%
43     1      0.1%
58     1      0.1%
80     1      0.1%
90     1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
32397  1      0.1%
24661  1      0.1%
24006  2      0.1%
22098  1      0.1%
21188  1      0.1%

buildingSize
Real number (ℝ≥0)

Distinct: 397
Distinct (%): 22.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 198.6199552
Minimum: 14
Maximum: 2400
Zeros: 0
Zeros (%): 0.0%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 14
5-th percentile: 50
Q1: 100.75
median: 145
Q3: 206.5
95-th percentile: 600
Maximum: 2400
Range: 2386
Interquartile range (IQR): 105.75

Descriptive statistics

Standard deviation: 180.8375422
Coefficient of variation (CV): 0.9104701593
Kurtosis: 17.4043728
Mean: 198.6199552
Median Absolute Deviation (MAD): 49.5
Skewness: 3.115866589
Sum: 354338
Variance: 32702.21667
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
145                 571    32.0%
55                  20     1.1%
400                 15     0.8%
78                  14     0.8%
300                 14     0.8%
60                  13     0.7%
45                  12     0.7%
81                  12     0.7%
200                 11     0.6%
72                  11     0.6%
Other values (387)  1091   61.2%

Minimum 5 values

Value  Count  Frequency (%)
14     1      0.1%
20     1      0.1%
22     1      0.1%
26     1      0.1%
27     1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
2400   1      0.1%
1186   1      0.1%
1100   3      0.2%
1091   1      0.1%
1080   1      0.1%

ID
Categorical

UNIQUE

Distinct: 1784
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 13.9 KiB

Value                Count  Frequency (%)
H_1135               1      0.1%
H_1405               1      0.1%
H_373                1      0.1%
H_8                  1      0.1%
H_1400               1      0.1%
H_147                1      0.1%
H_1070               1      0.1%
H_1118               1      0.1%
H_1591               1      0.1%
H_1628               1      0.1%
Other values (1774)  1774   99.4%

Frequencies of value counts

Unique

Unique: 1784
Unique (%): 100.0%
Histogram of lengths of the category

Length

Max length: 6
Median length: 5
Mean length: 5.379484305
Min length: 3

price
Real number (ℝ≥0)

Distinct: 587
Distinct (%): 32.9%
Missing: 2
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 6732706.833
Minimum: 269000
Maximum: 49990000
Zeros: 0
Zeros (%): 0.0%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 269000
5-th percentile: 965250
Q1: 2252750
median: 3950000
Q3: 7950000
95-th percentile: 24688500
Maximum: 49990000
Range: 49721000
Interquartile range (IQR): 5697250

Descriptive statistics

Standard deviation: 7597931.111
Coefficient of variation (CV): 1.128510612
Kurtosis: 6.925952712
Mean: 6732706.833
Median Absolute Deviation (MAD): 2155000
Skewness: 2.481251198
Sum: 1.199768358e+10
Variance: 5.772855716e+13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
3995000             21     1.2%
2950000             18     1.0%
4950000             18     1.0%
2850000             17     1.0%
2495000             17     1.0%
1295000             17     1.0%
2750000             16     0.9%
2995000             16     0.9%
1350000             15     0.8%
3200000             14     0.8%
Other values (577)  1613   90.4%

Minimum 5 values

Value   Count  Frequency (%)
269000  1      0.1%
320000  1      0.1%
350000  1      0.1%
400000  1      0.1%
475000  1      0.1%

Maximum 5 values

Value     Count  Frequency (%)
49990000  1      0.1%
49000000  2      0.1%
47500000  1      0.1%
45000000  3      0.2%
43500000  1      0.1%

propertyType_Apartment
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1765   98.9%
1      19     1.1%

propertyType_Guesthouse
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1783   99.9%
1      1      0.1%

propertyType_House
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1780   99.8%
1      4      0.2%

propertyType_Park
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1783   99.9%
1      1      0.1%

propertyType_apartment
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1147   64.3%
1      637    35.7%

propertyType_auction
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1780   99.8%
1      4      0.2%

propertyType_breakfast
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1783   99.9%
1      1      0.1%

propertyType_bus
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1778   99.7%
1      6      0.3%

propertyType_cottage
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1780   99.8%
1      4      0.2%

propertyType_farm
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1783   99.9%
1      1      0.1%

propertyType_flats
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1783   99.9%
1      1      0.1%

propertyType_guesthouse
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1781   99.8%
1      3      0.2%

propertyType_home
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1772   99.3%
1      12     0.7%

propertyType_house
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
1      959    53.8%
0      825    46.2%

propertyType_land
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1701   95.3%
1      83     4.7%

propertyType_loft
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1782   99.9%
1      2      0.1%

propertyType_office
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1780   99.8%
1      4      0.2%

propertyType_property
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1782   99.9%
1      2      0.1%

propertyType_townhouse
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value  Count  Frequency (%)
0      1744   97.8%
1      40     2.2%

Interactions

[Figures: 64 pairwise interaction plots]

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (a negative scale factor flips only its sign), so for a perfectly linear relationship the slope of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
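
The calculation described above can be sketched in plain Python. This is an illustration only, not the code the report itself runs; the function name pearson_r and the sample data are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (n - 1 denominator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)


# Hypothetical sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r_original = pearson_r(x, y)
# Shifting and rescaling each variable separately leaves r unchanged.
r_rescaled = pearson_r([10 * v + 7 for v in x], [0.5 * v - 3 for v in y])

print(r_original, r_rescaled)
```

The second call illustrates the invariance under changes of location and scale mentioned above.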

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
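
As a rough sketch of that recipe, the code below ranks each variable and then applies the Pearson formula to the ranks. It assumes that tied values share the average of their ranks, a common convention that the text above does not specify. The helper names and data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print(spearman_rho(x, y), pearson_r(x, y))
```

For this monotonic but nonlinear relationship, ρ reaches its maximum while Pearson's r stays below 1.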

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
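
The pair-counting definition above can be written out directly. The sketch below implements exactly that formula (often called τ-a) and counts tied pairs as neither concordant nor discordant. Library implementations often report a tie-adjusted variant instead, so their results can differ when ties are present. The function name and data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, as described above."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables order the pair the same way
        elif direction < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # direction == 0 means a tie in x or y; counted as neither (assumption)
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

print(kendall_tau_a(x, y))
```

Here five of the six pairs are concordant and one is discordant, giving τ = (5 - 1) / 6.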

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figures: 3 missing-value plots]

Sample

First rows

,Unnamed: 0,suburb,bedroom,bathroom,garage,parking,erfSize,buildingSize,ID,price,propertyType_Apartment,propertyType_Guesthouse,propertyType_House,propertyType_Park,propertyType_apartment,propertyType_auction,propertyType_breakfast,propertyType_bus,propertyType_cottage,propertyType_farm,propertyType_flats,propertyType_guesthouse,propertyType_home,propertyType_house,propertyType_land,propertyType_loft,propertyType_office,propertyType_property,propertyType_townhouse
0,0,Clifton,3,3.0,0,0,715.0,310.0,H_1,49990000.0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,Constantia Upper,7,7.0,3,0,7555.0,145.0,H_2,49000000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
2,2,Bantry Bay,3,3.5,2,0,626.0,145.0,H_3,49000000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
3,3,Fresnaye,5,5.0,4,4,1044.0,900.0,H_4,47500000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
4,4,Bantry Bay,3,3.0,2,0,715.0,546.0,H_5,45000000.0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,5,Waterfront (Cape Town),3,3.5,0,3,715.0,491.0,H_6,45000000.0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,6,Mouille Po,4,2.0,2,1,715.0,261.0,H_7,45000000.0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,7,Constantia Upper,5,4.0,3,0,4215.0,530.0,H_8,43500000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
8,8,Constantia Upper,5,7.0,4,10,8210.0,145.0,H_9,42000000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
9,9,Waterfront (Cape Town),3,3.0,0,0,715.0,216.0,H_10,41400000.0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Last rows

,Unnamed: 0,suburb,bedroom,bathroom,garage,parking,erfSize,buildingSize,ID,price,propertyType_Apartment,propertyType_Guesthouse,propertyType_House,propertyType_Park,propertyType_apartment,propertyType_auction,propertyType_breakfast,propertyType_bus,propertyType_cottage,propertyType_farm,propertyType_flats,propertyType_guesthouse,propertyType_home,propertyType_house,propertyType_land,propertyType_loft,propertyType_office,propertyType_property,propertyType_townhouse
1774,1774,Woodlands (Mitchells Pla,2,2.0,0,0,90.0,145.0,H_1775,500000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
1775,1775,Kalkfonte,0,0.0,0,0,160.0,145.0,H_1776,499000.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1776,1776,Khayelitsha,3,1.0,1,0,141.0,145.0,H_1777,495000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
1777,1777,Tafelsig,3,1.0,0,0,144.0,145.0,H_1778,480000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
1778,1778,Eastridge,3,1.0,0,0,280.0,145.0,H_1779,480000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
1779,1779,Maitland,1,1.0,0,1,715.0,26.0,H_1780,475000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
1780,1780,Kle,0,0.0,0,0,429.0,145.0,H_1781,400000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
1781,1781,Mandela Park,2,1.0,0,0,108.0,145.0,H_1782,350000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
1782,1782,Khayelitsha,2,1.0,0,0,99.0,145.0,H_1783,320000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
1783,1783,Mitchells Pla,0,0.0,0,0,715.0,66.0,H_1784,269000.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0